Please indicate:
Consider the breast cancer data from HW02. Plot, using the ww shapefiles provided in Lecture 20, three maps:
Notes:
readOGR() function from Lectures 23 and 25.
It’s hard to see exactly what’s going on in the smaller census tracts, so let’s try zooming in.
Both plots indicate that the incidence rate of breast cancer is somewhere between 0.5% and 2.5%, though we do see one outlier with a rate of about 4%. Additionally, there doesn’t seem to be any clear pattern distinguishing rural areas from the more metropolitan areas around Seattle and Tacoma.
And another quick zoom to make sure we don’t lose the little guys.
Interestingly, the image shows that most of the wealthiest areas are just outside of the cities, not actually in the cities themselves.
We will make our metric the product of the cancer incidence rate and the median household income. Higher values mark census tracts in which the cancer rate, the median income, or both are higher than usual.
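As a minimal sketch of how this metric could be computed, the toy data frame below stands in for the tract-level data. The column names `tract` and `median_income` and all of the values are assumptions for illustration, not the HW02 data.

```r
# Toy stand-in for the tract-level data; names and values are assumptions.
census <- data.frame(
  tract         = c("A", "B", "C", "D"),
  incidence     = c(0.010, 0.020, 0.010, NA),     # breast cancer incidence rate
  median_income = c(40000, 40000, 80000, 60000)   # median household income
)

# Product metric: large when the rate, the income, or both are large.
census$metric <- census$incidence * census$median_income

print(census)
```

Tracts B and C get the same score even though one has a doubled rate and the other a doubled income, so the product by itself cannot tell the two situations apart. Tracts with a missing rate get a missing score.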
Our image shows brighter spots in the wealthier areas, suggesting that wealthier communities tend to have higher rates of breast cancer.
It’s also worth taking a look at a plot of the two variables.
## Warning: Removed 711 rows containing non-finite values (stat_boxplot).
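For reference, one way an income grouping like this could be built is with quantile() and cut(). This is a sketch that assumes five equal-sized groups (quintiles) and uses made-up incomes; the actual construction of the grouping is not shown in this write-up.

```r
# Made-up incomes for 20 tracts, plus one tract with a missing value.
income <- c(seq(20000, 115000, by = 5000), NA)

# Quintile breakpoints (0%, 20%, ..., 100%), ignoring missing values.
breaks <- quantile(income, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Assign each tract to a group 1..5; include.lowest keeps the minimum in group 1.
income_quantile <- cut(income, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(income_quantile, useNA = "ifany")
```

Tracts with a missing income end up with a missing group, and rows with missing values are the kind that plotting and modelling functions drop with a warning like the one above.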
There does appear to be a slightly increasing trend, indicating that as the median household income increases, so does the incidence rate of breast cancer. We can check whether the differences between groups are statistically significant with a one-way ANOVA, using the following hypotheses:
\(H_0\): The mean cancer rate is the same in every income quantile.
\(H_A\): The mean cancer rate differs in at least one income quantile.
trend <- aov(incidence ~ factor(income_quantile), data = census)
summary(trend)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(income_quantile) 4 0.000203 5.087e-05 6.971 1.59e-05 ***
## Residuals 881 0.006430 7.300e-06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 711 observations deleted due to missingness
With a p-value of 1.59e-05, well below 0.05, we reject \(H_0\) in favor of \(H_A\). Our data suggest a statistically significant difference in mean cancer rate among the income quantiles, although the differences amount to only a few fractions of a percent.
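The F-test only says that the mean rates differ somewhere among the groups, not which groups differ. A possible follow-up, sketched here on simulated data rather than the homework data, is Tukey's honest significant difference procedure via TukeyHSD(). The column name `income_group` and the simulated values are assumptions.

```r
set.seed(42)

# Simulated stand-in: 5 income groups with 40 tracts each (not the HW data).
group_means <- seq(0.010, 0.014, length.out = 5)
census <- data.frame(
  income_group = factor(rep(1:5, each = 40)),
  incidence    = rnorm(200, mean = rep(group_means, each = 40), sd = 0.003)
)

fit <- aov(incidence ~ income_group, data = census)

# One row per pair of groups: estimated difference, 95% interval,
# and a p-value adjusted for the multiple comparisons.
pairs <- TukeyHSD(fit)$income_group
print(round(pairs, 4))
```

With five groups there are ten pairwise comparisons. Pairs whose interval excludes zero are the ones driving the overall result.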
Download the results of the 2000 election from the School of Public Affairs at American University in DC and create two maps involving only the lower 48 states that show:
where
Then answer the following questions:
Notes:
scale_fill_gradient2(name="", low="blue", high="red", mid="purple") for the appropriate “mid” point. See the ggplot2 webpage for this command for inspiration.
# This function eliminates all non-alphanumeric characters and spaces and
# converts all text to lower case.
clean.text <- function(text){
  text <- gsub("[^[:alnum:]]", "", text)
  text <- gsub(" ", "", text)
  text <- tolower(text)
  return(text)
}
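# ggplot2 provides map_data() (needs the maps package installed; coord_map()
# needs mapproj); dplyr provides %>% and as_tibble().
library(ggplot2)
library(dplyr)
# State and county map of US in 2010
# as_tibble() replaces the deprecated tbl_df()
US.state <- map_data("state") %>% as_tibble()
US.county <- map_data("county") %>% as_tibble()
US.county$subregion <- clean.text(US.county$subregion)
ggplot(US.county, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="white") +
  geom_path(col="black", size=0.01) +
  coord_map()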
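To illustrate why the name cleaning matters, here is a small sketch of matching county names from two sources. The spellings, the `county` and `votes` columns, and the vote counts are all made up for illustration; the real election file may be laid out differently.

```r
# Same cleaning idea as clean.text() above, repeated so the sketch runs alone.
clean.text <- function(text) {
  text <- gsub("[^[:alnum:]]", "", text)  # drop everything except letters and digits
  tolower(text)
}

# Hypothetical spellings of the same counties in the map data and a results file.
map_names     <- c("st. louis", "de kalb", "o'brien")
results_names <- c("St Louis", "DeKalb", "O'Brien")

# Toy results table with a cleaned key column.
results <- data.frame(county = results_names, votes = c(10, 20, 30))
results$key <- clean.text(results$county)
map_df <- data.frame(subregion = clean.text(map_names))

# After cleaning, the names line up and the tables can be merged.
merged <- merge(map_df, results, by.x = "subregion", by.y = "key")
print(merged)
```

County names repeat across states, so a real merge would probably need the state as a second key in addition to the cleaned county name.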
The Chief of the Portland Police is tired of reading through pages of crime reports and wants an interactive tool to visualize where different crimes occurred between 2004 and 2013. Obtain crime data for Portland for the years 2004 through 2013 from the CivicApps site, create (in a separate .Rmd file) an appropriate Shiny app, and publish it online. Post the hyperlink here.
Your Shiny app should take in two inputs. Think carefully about the best way to have users input these:
Using this app, answer the following questions: